{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l0Y7_lgN4jzM"
      },
      "source": [
        "# **Bioinformatics Project - Computational Drug Discovery [Part 3] Descriptor Calculation and Dataset Preparation(Modified Version towards unknown smiles predcition purpose)**\n",
        "\n",
        "Chanin Nantasenamat\n",
        "\n",
        "[*'Data Professor' YouTube channel*](http://youtube.com/dataprofessor)\n",
        "\n",
        "Modified by quantaosun@gmail.com\n",
        "\n",
        "\n",
        "In **Part 3**, we will be calculating molecular descriptors that are essentially quantitative description of the compounds in the dataset. Finally, we will be preparing this into a dataset for subsequent model prediction in notebook 5.\n",
        "\n",
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "o-4IOizard4P"
      },
      "source": [
        "## **Download PaDEL-Descriptor**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "H0mjQ2PcrSe5",
        "outputId": "2cb2b1f7-5770-40a3-fd2e-0f3c31ba8db1"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2022-01-09 12:11:01--  https://github.com/dataprofessor/bioinformatics/raw/master/padel.zip\n",
            "Resolving github.com (github.com)... 140.82.121.3\n",
            "Connecting to github.com (github.com)|140.82.121.3|:443... connected.\n",
            "HTTP request sent, awaiting response... 302 Found\n",
            "Location: https://raw.githubusercontent.com/dataprofessor/bioinformatics/master/padel.zip [following]\n",
            "--2022-01-09 12:11:01--  https://raw.githubusercontent.com/dataprofessor/bioinformatics/master/padel.zip\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 25768637 (25M) [application/zip]\n",
            "Saving to: ‘padel.zip’\n",
            "\n",
            "padel.zip           100%[===================>]  24.57M   163MB/s    in 0.2s    \n",
            "\n",
            "2022-01-09 12:11:02 (163 MB/s) - ‘padel.zip’ saved [25768637/25768637]\n",
            "\n",
            "--2022-01-09 12:11:02--  https://github.com/dataprofessor/bioinformatics/raw/master/padel.sh\n",
            "Resolving github.com (github.com)... 140.82.113.3\n",
            "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n",
            "HTTP request sent, awaiting response... 302 Found\n",
            "Location: https://raw.githubusercontent.com/dataprofessor/bioinformatics/master/padel.sh [following]\n",
            "--2022-01-09 12:11:02--  https://raw.githubusercontent.com/dataprofessor/bioinformatics/master/padel.sh\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 231 [text/plain]\n",
            "Saving to: ‘padel.sh’\n",
            "\n",
            "padel.sh            100%[===================>]     231  --.-KB/s    in 0s      \n",
            "\n",
            "2022-01-09 12:11:02 (9.03 MB/s) - ‘padel.sh’ saved [231/231]\n",
            "\n"
          ]
        }
      ],
      "source": [
        "! wget https://github.com/dataprofessor/bioinformatics/raw/master/padel.zip\n",
        "! wget https://github.com/dataprofessor/bioinformatics/raw/master/padel.sh"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "64HnTL4tS-nA",
        "outputId": "c54ab938-5f82-4389-fc13-7f9e7324af34"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Archive:  padel.zip\n",
            "   creating: PaDEL-Descriptor/\n",
            "  inflating: __MACOSX/._PaDEL-Descriptor  \n",
            "  inflating: PaDEL-Descriptor/MACCSFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._MACCSFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/AtomPairs2DFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._AtomPairs2DFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/EStateFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._EStateFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/Fingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._Fingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/.DS_Store  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._.DS_Store  \n",
            "   creating: PaDEL-Descriptor/license/\n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._license  \n",
            "  inflating: PaDEL-Descriptor/KlekotaRothFingerprintCount.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._KlekotaRothFingerprintCount.xml  \n",
            "  inflating: PaDEL-Descriptor/config  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._config  \n",
            "  inflating: PaDEL-Descriptor/PubchemFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._PubchemFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/ExtendedFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._ExtendedFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/KlekotaRothFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._KlekotaRothFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/GraphOnlyFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._GraphOnlyFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/SubstructureFingerprinter.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._SubstructureFingerprinter.xml  \n",
            "  inflating: PaDEL-Descriptor/Descriptors.xls  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._Descriptors.xls  \n",
            "   creating: PaDEL-Descriptor/lib/\n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._lib  \n",
            "  inflating: PaDEL-Descriptor/PaDEL-Descriptor.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._PaDEL-Descriptor.jar  \n",
            "  inflating: PaDEL-Descriptor/SubstructureFingerprintCount.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._SubstructureFingerprintCount.xml  \n",
            "  inflating: PaDEL-Descriptor/AtomPairs2DFingerprintCount.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._AtomPairs2DFingerprintCount.xml  \n",
            "  inflating: PaDEL-Descriptor/descriptors.xml  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/._descriptors.xml  \n",
            "  inflating: PaDEL-Descriptor/license/lgpl-2.1.txt  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/license/._lgpl-2.1.txt  \n",
            "  inflating: PaDEL-Descriptor/license/LICENSE.txt  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/license/._LICENSE.txt  \n",
            "  inflating: PaDEL-Descriptor/license/README - CDK  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/license/._README - CDK  \n",
            "  inflating: PaDEL-Descriptor/license/lgpl.license  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/license/._lgpl.license  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-core-2.4.7-SNAPSHOT(3).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-core-2.4.7-SNAPSHOT(3).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(6).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(6).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jgrapht-0.6.0(4).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jgrapht-0.6.0(4).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(2).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(2).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/xom-1.1(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._xom-1.1(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/swing-worker-1.1.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._swing-worker-1.1.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(3).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(3).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jgrapht-0.6.0(5).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jgrapht-0.6.0(5).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/appframework-1.0.3.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._appframework-1.0.3.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(7).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(7).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/vecmath1.2-1.14.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._vecmath1.2-1.14.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-smarts-2.4.7-SNAPSHOT(6).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-smarts-2.4.7-SNAPSHOT(6).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-core-2.4.7-SNAPSHOT(2).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-core-2.4.7-SNAPSHOT(2).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama(6).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama(6).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jgrapht-0.6.0(2).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jgrapht-0.6.0(2).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Descriptor(3).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Descriptor(3).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(4).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(4).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-base-2.4.7-SNAPSHOT.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-base-2.4.7-SNAPSHOT.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-smarts-2.4.7-SNAPSHOT(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-smarts-2.4.7-SNAPSHOT(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(8).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(8).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jgrapht-0.6.0.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jgrapht-0.6.0.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-core-2.4.7-SNAPSHOT(4).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-core-2.4.7-SNAPSHOT(4).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/xom-1.1.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._xom-1.1.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(5).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(5).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Descriptor(2).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Descriptor(2).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jgrapht-0.6.0(3).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jgrapht-0.6.0(3).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama(7).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama(7).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-core-2.4.7-SNAPSHOT.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-core-2.4.7-SNAPSHOT.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(6).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(6).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Descriptor(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Descriptor(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama(4).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama(4).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(2).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(2).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-smarts-2.4.7-SNAPSHOT(3).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-smarts-2.4.7-SNAPSHOT(3).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-smarts-2.4.7-SNAPSHOT(2).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-smarts-2.4.7-SNAPSHOT(2).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-smarts-2.4.7-SNAPSHOT.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-smarts-2.4.7-SNAPSHOT.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(3).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(3).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/l2fprod-common-all(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._l2fprod-common-all(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/l2fprod-common-all.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._l2fprod-common-all.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama(5).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama(5).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jgrapht-0.6.0(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jgrapht-0.6.0(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(7).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(7).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Descriptor.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Descriptor.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(4).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(4).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/cdk-1.4.15.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._cdk-1.4.15.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-smarts-2.4.7-SNAPSHOT(5).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-smarts-2.4.7-SNAPSHOT(5).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-core-2.4.7-SNAPSHOT(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-core-2.4.7-SNAPSHOT(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(8).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(8).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jgrapht-0.6.0(6).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jgrapht-0.6.0(6).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama(2).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama(2).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/jama(3).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._jama(3).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/commons-cli-1.2(1).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._commons-cli-1.2(1).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/guava-17.0.jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._guava-17.0.jar  \n",
            "  inflating: PaDEL-Descriptor/lib/ambit2-smarts-2.4.7-SNAPSHOT(4).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._ambit2-smarts-2.4.7-SNAPSHOT(4).jar  \n",
            "  inflating: PaDEL-Descriptor/lib/libPaDEL-Jobs(5).jar  \n",
            "  inflating: __MACOSX/PaDEL-Descriptor/lib/._libPaDEL-Jobs(5).jar  \n"
          ]
        }
      ],
      "source": [
        "! unzip padel.zip"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QmxXXFa4wTNG"
      },
      "source": [
        "## **Load unknown smiles data, called \"example.txt\"**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fcBvxkPWKFRV"
      },
      "source": [
        "Download the curated ChEMBL bioactivity data that has been pre-processed from Parts 1 and 2 of this Bioinformatics Project series. Here we will be using the **bioactivity_data_3class_pIC50.csv** file that essentially contain the pIC50 values that we will be using for building a regression model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "Fpu5C7HlwV9s"
      },
      "outputs": [],
      "source": [
        "import pandas as pd"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BB5G4I1Q2VN5"
      },
      "source": [
        "# We will convert a raw txt file with two lines of smiles to a standard dataframe by a csv intermediate"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "id": "GCcE8J5XwjtB"
      },
      "outputs": [],
      "source": [
        "df3 = pd.read_csv('/content/sars-cov-2-local_bioactivity_data_preprocessed.csv')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 125
        },
        "id": "60z_N6egNiSJ",
        "outputId": "5865c062-c09e-41b3-c88e-b3108a9b98dc"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-c8d35cd4-c992-40fc-8b56-dfe87abd0c63\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>smiles</th>\n",
              "      <th>bioactivity</th>\n",
              "      <th>bioactivity_class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>CN(N=C1)C=C1C2=CC=C(OC(F)=C3C4=CN(CC5=CC=CC=C5...</td>\n",
              "      <td>8.5</td>\n",
              "      <td>active</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-c8d35cd4-c992-40fc-8b56-dfe87abd0c63')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-c8d35cd4-c992-40fc-8b56-dfe87abd0c63 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-c8d35cd4-c992-40fc-8b56-dfe87abd0c63');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "   Unnamed: 0  ... bioactivity_class\n",
              "0           0  ...            active\n",
              "\n",
              "[1 rows x 4 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 26
        }
      ],
      "source": [
        "df3"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vfl4o7FL28hj"
      },
      "source": [
        "Next the txt file will be save as a csv file, then import back again, to beautify the structure."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 81
        },
        "id": "AhGXxI3s1ocY",
        "outputId": "5994608e-48e7-4c0c-e655-38cf84f60f30"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-9668334d-6ba5-41ba-8819-8cf3e6922732\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>smiles</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CN(N=C1)C=C1C2=CC=C(OC(F)=C3C4=CN(CC5=CC=CC=C5...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-9668334d-6ba5-41ba-8819-8cf3e6922732')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-9668334d-6ba5-41ba-8819-8cf3e6922732 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-9668334d-6ba5-41ba-8819-8cf3e6922732');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "                                              smiles\n",
              "0  CN(N=C1)C=C1C2=CC=C(OC(F)=C3C4=CN(CC5=CC=CC=C5..."
            ]
          },
          "metadata": {},
          "execution_count": 28
        }
      ],
      "source": [
        "selection=['smiles']\n",
        "df3[selection]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 81
        },
        "id": "kBPFVwji11OJ",
        "outputId": "a63ea081-4af4-47ad-b3fa-231c1d838bba"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-dfd1e56d-c5e3-4b75-830f-a5047d4dedda\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>smiles</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CN(N=C1)C=C1C2=CC=C(OC(F)=C3C4=CN(CC5=CC=CC=C5...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-dfd1e56d-c5e3-4b75-830f-a5047d4dedda')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-dfd1e56d-c5e3-4b75-830f-a5047d4dedda button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-dfd1e56d-c5e3-4b75-830f-a5047d4dedda');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "                                              smiles\n",
              "0  CN(N=C1)C=C1C2=CC=C(OC(F)=C3C4=CN(CC5=CC=CC=C5..."
            ]
          },
          "metadata": {},
          "execution_count": 29
        }
      ],
      "source": [
        "df4_selection = df3[selection]\n",
        "df4_selection"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {
        "id": "cly4Gagw2A2S"
      },
      "outputs": [],
      "source": [
        "df4_selection.to_csv('molecule.smi', sep= '\\t', index=False, header=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "nRSCoPVDSkf5",
        "outputId": "cbb32339-ebb8-4f90-e0ca-9ee808ac70bd"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "CN(N=C1)C=C1C2=CC=C(OC(F)=C3C4=CN(CC5=CC=CC=C5)N=C4)C3=N2\n"
          ]
        }
      ],
      "source": [
        "! cat molecule.smi | head -5"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 32,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "GlYaJ9pzUGjS",
        "outputId": "9fd2c672-cbb3-4321-bede-1cbfaa801989"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "1\n"
          ]
        }
      ],
      "source": [
        "! cat molecule.smi | wc -l"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YzN_S4Quro5S"
      },
      "source": [
        "## **Calculate fingerprint descriptors, for the unknown smiles**\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JsgTV-ByxdMa"
      },
      "source": [
        "### **Calculate PaDEL descriptors**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "hSCopQvEiSMj",
        "outputId": "2dba1700-c41a-482f-b23f-c12e3a31c8d1"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "java -Xms1G -Xmx1G -Djava.awt.headless=true -jar ./PaDEL-Descriptor/PaDEL-Descriptor.jar -removesalt -standardizenitro -fingerprints -descriptortypes ./PaDEL-Descriptor/PubchemFingerprinter.xml -dir ./ -file descriptors_output.csv\n"
          ]
        }
      ],
      "source": [
        "! cat padel.sh"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6kN9jrGpS5nE",
        "outputId": "9aded54a-1617-426d-bd72-88a8f4299c28"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Processing AUTOGEN_molecule in molecule.smi (1/1). \n",
            "Descriptor calculation completed in 1.869 secs . Average speed: 1.87 s/mol.\n"
          ]
        }
      ],
      "source": [
        "! bash padel.sh"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 35,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2p7rAVy_k_hH",
        "outputId": "7feca88e-ad52-4705-a89f-2030b25482eb"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "total 25736\n",
            "-rw-r--r-- 1 root root    13129 Jan  9 12:25 descriptors_output.csv\n",
            "drwxr-xr-x 3 root root     4096 Jan  9 12:11 __MACOSX\n",
            "-rw-r--r-- 1 root root       58 Jan  9 12:25 molecule.smi\n",
            "drwxrwxr-x 4 root root     4096 May 30  2020 PaDEL-Descriptor\n",
            "-rw-r--r-- 1 root root      231 Jan  9 12:11 padel.sh\n",
            "-rw-r--r-- 1 root root 25768637 Jan  9 12:11 padel.zip\n",
            "drwxr-xr-x 1 root root     4096 Dec 23 14:32 sample_data\n",
            "-rw-r--r-- 1 root root   258771 Jan  9 12:23 sars-cov-2_bioactivity_data_2class_pIC50_pubchem_fp.csv\n",
            "-rw-r--r-- 1 root root   258771 Jan  9 12:22 sars-cov-2_bioactivity_data_3class_pIC50_pubchem_fp.csv\n",
            "-rw-r--r-- 1 root root    14767 Jan  9 12:11 sars-cov-2_bioactivity_data_processed_2classes_pIC50.csv\n",
            "-rw-r--r-- 1 root root      109 Jan  9 12:24 sars-cov-2-local_bioactivity_data_preprocessed.csv\n"
          ]
        }
      ],
      "source": [
        "! ls -l"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gUMlPfFrxicj"
      },
      "source": [
        "## **Preparing the X matrices**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "30aa4WP4ZA8M"
      },
      "source": [
        "### **X data matrix**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 36,
      "metadata": {
        "id": "3g319qxVl7tY"
      },
      "outputs": [],
      "source": [
        "df3_X = pd.read_csv('descriptors_output.csv')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 37,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 174
        },
        "id": "hBp1PTObFQDd",
        "outputId": "fc92b936-8d69-4240-9d48-0790d5a538c7"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-27bb86b6-b08b-4548-8ef1-87f0899249f9\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Name</th>\n",
              "      <th>PubchemFP0</th>\n",
              "      <th>PubchemFP1</th>\n",
              "      <th>PubchemFP2</th>\n",
              "      <th>PubchemFP3</th>\n",
              "      <th>PubchemFP4</th>\n",
              "      <th>PubchemFP5</th>\n",
              "      <th>PubchemFP6</th>\n",
              "      <th>PubchemFP7</th>\n",
              "      <th>PubchemFP8</th>\n",
              "      <th>PubchemFP9</th>\n",
              "      <th>PubchemFP10</th>\n",
              "      <th>PubchemFP11</th>\n",
              "      <th>PubchemFP12</th>\n",
              "      <th>PubchemFP13</th>\n",
              "      <th>PubchemFP14</th>\n",
              "      <th>PubchemFP15</th>\n",
              "      <th>PubchemFP16</th>\n",
              "      <th>PubchemFP17</th>\n",
              "      <th>PubchemFP18</th>\n",
              "      <th>PubchemFP19</th>\n",
              "      <th>PubchemFP20</th>\n",
              "      <th>PubchemFP21</th>\n",
              "      <th>PubchemFP22</th>\n",
              "      <th>PubchemFP23</th>\n",
              "      <th>PubchemFP24</th>\n",
              "      <th>PubchemFP25</th>\n",
              "      <th>PubchemFP26</th>\n",
              "      <th>PubchemFP27</th>\n",
              "      <th>PubchemFP28</th>\n",
              "      <th>PubchemFP29</th>\n",
              "      <th>PubchemFP30</th>\n",
              "      <th>PubchemFP31</th>\n",
              "      <th>PubchemFP32</th>\n",
              "      <th>PubchemFP33</th>\n",
              "      <th>PubchemFP34</th>\n",
              "      <th>PubchemFP35</th>\n",
              "      <th>PubchemFP36</th>\n",
              "      <th>PubchemFP37</th>\n",
              "      <th>PubchemFP38</th>\n",
              "      <th>...</th>\n",
              "      <th>PubchemFP841</th>\n",
              "      <th>PubchemFP842</th>\n",
              "      <th>PubchemFP843</th>\n",
              "      <th>PubchemFP844</th>\n",
              "      <th>PubchemFP845</th>\n",
              "      <th>PubchemFP846</th>\n",
              "      <th>PubchemFP847</th>\n",
              "      <th>PubchemFP848</th>\n",
              "      <th>PubchemFP849</th>\n",
              "      <th>PubchemFP850</th>\n",
              "      <th>PubchemFP851</th>\n",
              "      <th>PubchemFP852</th>\n",
              "      <th>PubchemFP853</th>\n",
              "      <th>PubchemFP854</th>\n",
              "      <th>PubchemFP855</th>\n",
              "      <th>PubchemFP856</th>\n",
              "      <th>PubchemFP857</th>\n",
              "      <th>PubchemFP858</th>\n",
              "      <th>PubchemFP859</th>\n",
              "      <th>PubchemFP860</th>\n",
              "      <th>PubchemFP861</th>\n",
              "      <th>PubchemFP862</th>\n",
              "      <th>PubchemFP863</th>\n",
              "      <th>PubchemFP864</th>\n",
              "      <th>PubchemFP865</th>\n",
              "      <th>PubchemFP866</th>\n",
              "      <th>PubchemFP867</th>\n",
              "      <th>PubchemFP868</th>\n",
              "      <th>PubchemFP869</th>\n",
              "      <th>PubchemFP870</th>\n",
              "      <th>PubchemFP871</th>\n",
              "      <th>PubchemFP872</th>\n",
              "      <th>PubchemFP873</th>\n",
              "      <th>PubchemFP874</th>\n",
              "      <th>PubchemFP875</th>\n",
              "      <th>PubchemFP876</th>\n",
              "      <th>PubchemFP877</th>\n",
              "      <th>PubchemFP878</th>\n",
              "      <th>PubchemFP879</th>\n",
              "      <th>PubchemFP880</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>AUTOGEN_molecule</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>...</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>1 rows × 882 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-27bb86b6-b08b-4548-8ef1-87f0899249f9')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-27bb86b6-b08b-4548-8ef1-87f0899249f9 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-27bb86b6-b08b-4548-8ef1-87f0899249f9');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "               Name  PubchemFP0  ...  PubchemFP879  PubchemFP880\n",
              "0  AUTOGEN_molecule           1  ...             0             0\n",
              "\n",
              "[1 rows x 882 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 37
        }
      ],
      "source": [
        "df3_X"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "df3_X = df3_X.drop(columns=['Name'])\n",
        "df3_X"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 174
        },
        "id": "QX2UdupmxpD-",
        "outputId": "a43a3382-8237-44bd-e3af-a40e4d906cab"
      },
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-48ce1cc4-4913-4758-ba51-2efa1441c43d\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>PubchemFP0</th>\n",
              "      <th>PubchemFP1</th>\n",
              "      <th>PubchemFP2</th>\n",
              "      <th>PubchemFP3</th>\n",
              "      <th>PubchemFP4</th>\n",
              "      <th>PubchemFP5</th>\n",
              "      <th>PubchemFP6</th>\n",
              "      <th>PubchemFP7</th>\n",
              "      <th>PubchemFP8</th>\n",
              "      <th>PubchemFP9</th>\n",
              "      <th>PubchemFP10</th>\n",
              "      <th>PubchemFP11</th>\n",
              "      <th>PubchemFP12</th>\n",
              "      <th>PubchemFP13</th>\n",
              "      <th>PubchemFP14</th>\n",
              "      <th>PubchemFP15</th>\n",
              "      <th>PubchemFP16</th>\n",
              "      <th>PubchemFP17</th>\n",
              "      <th>PubchemFP18</th>\n",
              "      <th>PubchemFP19</th>\n",
              "      <th>PubchemFP20</th>\n",
              "      <th>PubchemFP21</th>\n",
              "      <th>PubchemFP22</th>\n",
              "      <th>PubchemFP23</th>\n",
              "      <th>PubchemFP24</th>\n",
              "      <th>PubchemFP25</th>\n",
              "      <th>PubchemFP26</th>\n",
              "      <th>PubchemFP27</th>\n",
              "      <th>PubchemFP28</th>\n",
              "      <th>PubchemFP29</th>\n",
              "      <th>PubchemFP30</th>\n",
              "      <th>PubchemFP31</th>\n",
              "      <th>PubchemFP32</th>\n",
              "      <th>PubchemFP33</th>\n",
              "      <th>PubchemFP34</th>\n",
              "      <th>PubchemFP35</th>\n",
              "      <th>PubchemFP36</th>\n",
              "      <th>PubchemFP37</th>\n",
              "      <th>PubchemFP38</th>\n",
              "      <th>PubchemFP39</th>\n",
              "      <th>...</th>\n",
              "      <th>PubchemFP841</th>\n",
              "      <th>PubchemFP842</th>\n",
              "      <th>PubchemFP843</th>\n",
              "      <th>PubchemFP844</th>\n",
              "      <th>PubchemFP845</th>\n",
              "      <th>PubchemFP846</th>\n",
              "      <th>PubchemFP847</th>\n",
              "      <th>PubchemFP848</th>\n",
              "      <th>PubchemFP849</th>\n",
              "      <th>PubchemFP850</th>\n",
              "      <th>PubchemFP851</th>\n",
              "      <th>PubchemFP852</th>\n",
              "      <th>PubchemFP853</th>\n",
              "      <th>PubchemFP854</th>\n",
              "      <th>PubchemFP855</th>\n",
              "      <th>PubchemFP856</th>\n",
              "      <th>PubchemFP857</th>\n",
              "      <th>PubchemFP858</th>\n",
              "      <th>PubchemFP859</th>\n",
              "      <th>PubchemFP860</th>\n",
              "      <th>PubchemFP861</th>\n",
              "      <th>PubchemFP862</th>\n",
              "      <th>PubchemFP863</th>\n",
              "      <th>PubchemFP864</th>\n",
              "      <th>PubchemFP865</th>\n",
              "      <th>PubchemFP866</th>\n",
              "      <th>PubchemFP867</th>\n",
              "      <th>PubchemFP868</th>\n",
              "      <th>PubchemFP869</th>\n",
              "      <th>PubchemFP870</th>\n",
              "      <th>PubchemFP871</th>\n",
              "      <th>PubchemFP872</th>\n",
              "      <th>PubchemFP873</th>\n",
              "      <th>PubchemFP874</th>\n",
              "      <th>PubchemFP875</th>\n",
              "      <th>PubchemFP876</th>\n",
              "      <th>PubchemFP877</th>\n",
              "      <th>PubchemFP878</th>\n",
              "      <th>PubchemFP879</th>\n",
              "      <th>PubchemFP880</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>...</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>1 rows × 881 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-48ce1cc4-4913-4758-ba51-2efa1441c43d')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-48ce1cc4-4913-4758-ba51-2efa1441c43d button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-48ce1cc4-4913-4758-ba51-2efa1441c43d');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "   PubchemFP0  PubchemFP1  PubchemFP2  ...  PubchemFP878  PubchemFP879  PubchemFP880\n",
              "0           1           1           1  ...             0             0             0\n",
              "\n",
              "[1 rows x 881 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 38
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        " # Y variable"
      ],
      "metadata": {
        "id": "TleL3rRUx3fG"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df3_Y = df3['bioactivity']\n",
        "df3_Y"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "o9VYwQadxxn1",
        "outputId": "8baf1006-bc75-45e6-8b2f-4d91c4f59136"
      },
      "execution_count": 40,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0    8.5\n",
              "Name: bioactivity, dtype: float64"
            ]
          },
          "metadata": {},
          "execution_count": 40
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Combine X and Y"
      ],
      "metadata": {
        "id": "X2Vdez08yMb8"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "dataset3 = pd.concat([df3_X,df3_Y], axis=1)\n",
        "dataset3"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 174
        },
        "id": "LUrRRezayMph",
        "outputId": "17f9a5f7-6ec9-421e-caa1-f7e0b163b5f0"
      },
      "execution_count": 41,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-f76d5263-b304-42a8-ad75-1c3ff91e3a9f\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>PubchemFP0</th>\n",
              "      <th>PubchemFP1</th>\n",
              "      <th>PubchemFP2</th>\n",
              "      <th>PubchemFP3</th>\n",
              "      <th>PubchemFP4</th>\n",
              "      <th>PubchemFP5</th>\n",
              "      <th>PubchemFP6</th>\n",
              "      <th>PubchemFP7</th>\n",
              "      <th>PubchemFP8</th>\n",
              "      <th>PubchemFP9</th>\n",
              "      <th>PubchemFP10</th>\n",
              "      <th>PubchemFP11</th>\n",
              "      <th>PubchemFP12</th>\n",
              "      <th>PubchemFP13</th>\n",
              "      <th>PubchemFP14</th>\n",
              "      <th>PubchemFP15</th>\n",
              "      <th>PubchemFP16</th>\n",
              "      <th>PubchemFP17</th>\n",
              "      <th>PubchemFP18</th>\n",
              "      <th>PubchemFP19</th>\n",
              "      <th>PubchemFP20</th>\n",
              "      <th>PubchemFP21</th>\n",
              "      <th>PubchemFP22</th>\n",
              "      <th>PubchemFP23</th>\n",
              "      <th>PubchemFP24</th>\n",
              "      <th>PubchemFP25</th>\n",
              "      <th>PubchemFP26</th>\n",
              "      <th>PubchemFP27</th>\n",
              "      <th>PubchemFP28</th>\n",
              "      <th>PubchemFP29</th>\n",
              "      <th>PubchemFP30</th>\n",
              "      <th>PubchemFP31</th>\n",
              "      <th>PubchemFP32</th>\n",
              "      <th>PubchemFP33</th>\n",
              "      <th>PubchemFP34</th>\n",
              "      <th>PubchemFP35</th>\n",
              "      <th>PubchemFP36</th>\n",
              "      <th>PubchemFP37</th>\n",
              "      <th>PubchemFP38</th>\n",
              "      <th>PubchemFP39</th>\n",
              "      <th>...</th>\n",
              "      <th>PubchemFP842</th>\n",
              "      <th>PubchemFP843</th>\n",
              "      <th>PubchemFP844</th>\n",
              "      <th>PubchemFP845</th>\n",
              "      <th>PubchemFP846</th>\n",
              "      <th>PubchemFP847</th>\n",
              "      <th>PubchemFP848</th>\n",
              "      <th>PubchemFP849</th>\n",
              "      <th>PubchemFP850</th>\n",
              "      <th>PubchemFP851</th>\n",
              "      <th>PubchemFP852</th>\n",
              "      <th>PubchemFP853</th>\n",
              "      <th>PubchemFP854</th>\n",
              "      <th>PubchemFP855</th>\n",
              "      <th>PubchemFP856</th>\n",
              "      <th>PubchemFP857</th>\n",
              "      <th>PubchemFP858</th>\n",
              "      <th>PubchemFP859</th>\n",
              "      <th>PubchemFP860</th>\n",
              "      <th>PubchemFP861</th>\n",
              "      <th>PubchemFP862</th>\n",
              "      <th>PubchemFP863</th>\n",
              "      <th>PubchemFP864</th>\n",
              "      <th>PubchemFP865</th>\n",
              "      <th>PubchemFP866</th>\n",
              "      <th>PubchemFP867</th>\n",
              "      <th>PubchemFP868</th>\n",
              "      <th>PubchemFP869</th>\n",
              "      <th>PubchemFP870</th>\n",
              "      <th>PubchemFP871</th>\n",
              "      <th>PubchemFP872</th>\n",
              "      <th>PubchemFP873</th>\n",
              "      <th>PubchemFP874</th>\n",
              "      <th>PubchemFP875</th>\n",
              "      <th>PubchemFP876</th>\n",
              "      <th>PubchemFP877</th>\n",
              "      <th>PubchemFP878</th>\n",
              "      <th>PubchemFP879</th>\n",
              "      <th>PubchemFP880</th>\n",
              "      <th>bioactivity</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>...</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>8.5</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>1 rows × 882 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-f76d5263-b304-42a8-ad75-1c3ff91e3a9f')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-f76d5263-b304-42a8-ad75-1c3ff91e3a9f button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-f76d5263-b304-42a8-ad75-1c3ff91e3a9f');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "   PubchemFP0  PubchemFP1  PubchemFP2  ...  PubchemFP879  PubchemFP880  bioactivity\n",
              "0           1           1           1  ...             0             0          8.5\n",
              "\n",
              "[1 rows x 882 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 41
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nj_ugDTJ3WVh"
      },
      "source": [
        "# You may have noticed that we now have a very long bit of 882 descriptor set, but when we built the model earlier, there are much less descriptors. This mismatch will be addressed later."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "dataset3.to_csv('sars-cov-2-local_bioactivity_data_2class_pIC50_pubchem_fp.csv', index=False)"
      ],
      "metadata": {
        "id": "oM3yBN5Ayg-1"
      },
      "execution_count": 24,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nFpLoNRHeRa6"
      },
      "source": [
        "# **Let's download the CSV file to your local computer for the Part 3B (Model Building).**"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "3_build.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.5"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}